Web Document Clustering Using KEA-Means Algorithm

Authors

  • Swapnali Ware
  • Shen Huang
  • Zheng Chen
  • Yong Yu
Abstract

In most traditional document clustering techniques, the total number of clusters is not known in advance, and the cluster that contains the target information, or the precise information associated with a cluster, cannot be determined. The k-means algorithm addresses this problem by taking the number of clusters, k, as an input. However, if the value of k is changed, the precision of each result also changes. To solve this problem, this paper proposes a new clustering algorithm, KEA-Means, which combines Kea (a key phrase extraction algorithm) with the k-means algorithm. Kea returns several key phrases from the source documents by using machine learning: it builds a model containing rules for generating the number of clusters of the web documents in the dataset. The combined algorithm therefore determines the number of clusters automatically at run time. The KEA-Means clustering algorithm provides an easy and efficient way to extract test documents from massive quantities of resources.
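The abstract does not spell out the rule that turns key phrases into a cluster count, so the following is only a rough sketch of the general idea under loud assumptions. It replaces Kea with a hypothetical stand-in (the single highest TF-IDF term of each document), sets k to the number of distinct top terms, and seeds a plain k-means with one document per term. The names `tfidf_vectors`, `top_key_term`, and `kea_means_sketch` are illustrative and do not come from the paper.

```python
import math
import re
from collections import Counter


def tokenize(text):
    return re.findall(r"[a-z]+", text.lower())


def tfidf_vectors(docs):
    """Return one sparse {term: weight} dict per document."""
    tokenized = [tokenize(d) for d in docs]
    n = len(docs)
    df = Counter(term for toks in tokenized for term in set(toks))
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)
        vectors.append(
            {t: (c / len(toks)) * math.log(n / df[t]) for t, c in tf.items()}
        )
    return vectors


def top_key_term(vector):
    # ASSUMPTION: stand-in for Kea. Real Kea scores candidate phrases with a
    # trained model; here we just take the highest TF-IDF term (ties broken
    # alphabetically so the result is deterministic).
    return min(vector, key=lambda t: (-vector[t], t))


def sq_dist(a, b):
    return sum((a.get(t, 0.0) - b.get(t, 0.0)) ** 2 for t in set(a) | set(b))


def mean_vector(vectors):
    total = Counter()
    for v in vectors:
        for t, w in v.items():
            total[t] += w
    return {t: w / len(vectors) for t, w in total.items()}


def kea_means_sketch(docs, max_iter=20):
    """Derive k from key terms, then run k-means. Returns (k, labels)."""
    vectors = tfidf_vectors(docs)
    # ASSUMPTION: k = number of distinct top key terms; the first document
    # carrying each term becomes that cluster's initial centroid.
    seeds = {}
    for i, v in enumerate(vectors):
        seeds.setdefault(top_key_term(v), i)
    centroids = [vectors[i] for i in seeds.values()]
    k = len(centroids)
    labels = None
    for _ in range(max_iter):
        new_labels = [
            min(range(k), key=lambda c: sq_dist(v, centroids[c])) for v in vectors
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        for c in range(k):
            members = [v for v, l in zip(vectors, labels) if l == c]
            if members:
                centroids[c] = mean_vector(members)
    return k, labels


docs = [
    "cats cats cats purr",
    "cats cats cats meow",
    "stocks stocks stocks rise",
    "stocks stocks stocks fall",
]
print(kea_means_sketch(docs))  # → (2, [0, 0, 1, 1])
```

The caller never supplies k here; it falls out of the extracted key terms, which is the property the abstract attributes to KEA-Means. A faithful implementation would replace `top_key_term` with a trained key phrase extractor and whatever cluster-count rule the paper actually defines.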

Similar resources

An Evaluation of Kea: An Automatic Keyphrase Extraction Algorithm

Keyphrases, often defined as keywords, are an important means of document summarization, searching, browsing, and clustering. This paper describes and evaluates Kea, an algorithm for automatically extracting keyphrases from text. Kea identifies candidate keyphrases using lexical methods, calculates TF-IDF feature values for each candidate, and uses a naïve Bayes learning scheme to predict keyphras...

A Comparison of Two Novel Algorithms for Clustering Web Documents

In this paper we investigate the clustering of web document collections using two variants of the popular k-means clustering algorithm. The first variant is the global k-means method, which computes “good” initial cluster centers deterministically rather than relying on random initialization. The second variant allows for the use of graphs as fundamental representations of data items instead of ...

A Particle Swarm Optimization based fuzzy c means approach for efficient web document clustering

There is a need to organize large sets of documents into categories through clustering, so that searching for and finding relevant information on the web, with its large number of documents, becomes easier and quicker. Hence we need more efficient clustering algorithms for organizing documents. Clustering on large text datasets can be effectively done using partitional clustering algorithm...

Document Clustering Using Semantic Cliques Aggregation

Search engines are indispensable tools for finding information amid massive numbers of web pages and documents. A good search engine needs to retrieve information that is not only returned quickly but also relevant to the users’ queries. Most search engines answer user queries quickly; however, they provide little guarantee of precision, even for highly detailed queries. In such ca...

A Hybrid Approach for Web Document Clustering Using K-means and Artificial Bee Colony Algorithm

Nowadays data growth is directly proportional to time and it is a major challenge to store the data in an organised fashion. Document clustering is the solution for organising relevant documents together. In this paper, a web clustering algorithm namely WDC-KABC is proposed to cluster the web documents effectively. The proposed algorithm uses the features of both K-means and Artificial Bee Colo...

Publication date: 2012